Detecting Wikipedia Vandalism

Authors

  • Tony Jin
  • Lynnelle Ye
  • Hanzhi Zhu
Abstract

Since its inception in 2001, Wikipedia has become the largest encyclopedia ever created. With over 4 million articles in the English edition alone, it is also the highest-traffic educational website on the Internet. It receives over 100,000 edits per day, a volume that is daunting for human editors to monitor for vandalism, spam, and other inappropriate content. Vandalism-reversion bots already exist, but they are generally hard-coded and may not be efficient enough at detecting vandalism. Types of vandalism include insertion of obscenities or personal attacks, deletion of valid content, and intentional introduction of incorrect facts (which can be difficult even for a human to detect). We will experiment with machine learning techniques to create a vandalism detection bot. We will consider features such as character frequencies, word attributes, attributes of the comment associated with the revision, and the history and attributes of the editor. We will apply logistic regression and Naive Bayes to these features, and we will also consider training a support vector machine (SVM) on them.
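To make the proposed pipeline concrete, the following is a minimal sketch of one way the feature extraction and logistic regression steps could fit together. It is not the authors' implementation. The function names, the five features (uppercase ratio, digit ratio, longest repeated-character run, empty edit comment, anonymous editor), the toy revisions, and the hyperparameters are all assumptions chosen for illustration.

```python
import math

def extract_features(inserted_text, comment, is_anonymous):
    """Map one revision to a numeric feature vector (hypothetical features)."""
    n = max(len(inserted_text), 1)
    upper_ratio = sum(c.isupper() for c in inserted_text) / n
    digit_ratio = sum(c.isdigit() for c in inserted_text) / n
    # Longest run of one repeated character, e.g. "!!!!!!" or "aaaaaa".
    longest_run, run, prev = 0, 0, None
    for c in inserted_text:
        run = run + 1 if c == prev else 1
        longest_run = max(longest_run, run)
        prev = c
    return [
        upper_ratio,
        digit_ratio,
        min(longest_run, 10) / 10.0,          # capped and scaled to [0, 1]
        1.0 if not comment.strip() else 0.0,  # edit comment left empty
        1.0 if is_anonymous else 0.0,         # editor attribute
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability that a feature vector is vandalism."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Toy labeled revisions (invented for illustration): 1 = vandalism, 0 = regular.
revisions = [
    ("YOU ARE ALL STUPID!!!!!!", "", True, 1),
    ("aaaaaaaaaaaaaa", "", True, 1),
    ("lol lol lol this page sucks", "", True, 1),
    ("The city was founded in 1820 by settlers.", "added founding date", False, 0),
    ("She studied physics at the university.", "fix grammar", False, 0),
    ("Population grew steadily after the war.", "expand history section", True, 0),
]
X = [extract_features(text, comment, anon) for text, comment, anon, _ in revisions]
y = [label for _, _, _, label in revisions]
w, b = train_logistic_regression(X, y)

# Score an unseen edit; the printed value depends on the fitted weights.
new_edit = extract_features("IDIOTS!!!!!!!!", "", True)
print(f"P(vandalism) = {predict_proba(w, b, new_edit):.3f}")
```

A real system would compute such features from the diff between consecutive revisions and train on a labeled corpus. The same feature vectors could also be passed to a Naive Bayes classifier or an SVM for comparison.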

Similar resources

Using Language Models to Detect Wikipedia Vandalism

This paper explores a statistical language modeling approach for detecting Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring, as well as the high visibility of its content, has exposed Wikipedia articles to vandalism, defined as malicious editing intended to compromise the integrity of the content of articles. Ex...

Detecting Vandalism on Wikipedia across Multiple Languages

Vandalism, the malicious modification or editing of articles, is a serious problem for free and open access online encyclopedias such as Wikipedia. Over the 13 year lifetime of Wikipedia, editors have identified and repaired vandalism in 1.6% of more than 500 million revisions of over 9 million English articles, but smaller manually inspected sets of revisions for research show vandalism may ap...

Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism

The paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from...

Detecting Wikipedia Vandalism using WikiTrust

WikiTrust is a reputation system for Wikipedia authors and content. WikiTrust computes three main quantities: edit quality, author reputation, and content reputation. The edit quality measures how well each edit, that is, each change introduced in a revision, is preserved in subsequent revisions. Authors who perform good quality edits gain reputation, and text which is revised by several high-r...

Publication date: 2012